 appropriate response


Fine-Tuning DialoGPT on Common Diseases in Rural Nepal for Medical Conversations

Poudel, Birat, Ghimire, Satyam, Prasad, Er. Prakash Chandra

arXiv.org Artificial Intelligence

Conversational agents are increasingly being explored to support healthcare delivery, particularly in resource-constrained settings such as rural Nepal. Large-scale conversational models typically rely on internet connectivity and cloud infrastructure, which may not be accessible in rural areas. In this study, we fine-tuned DialoGPT, a lightweight generative dialogue model that can operate offline, on a synthetically constructed dataset of doctor-patient interactions covering ten common diseases prevalent in rural Nepal, including common cold, seasonal fever, diarrhea, typhoid fever, gastritis, food poisoning, malaria, dengue fever, tuberculosis, and pneumonia. Despite being trained on a limited, domain-specific dataset, the fine-tuned model produced coherent, contextually relevant, and medically appropriate responses, demonstrating an understanding of symptoms, disease context, and empathetic communication. These results highlight the adaptability of compact, offline-capable dialogue models and the effectiveness of targeted datasets for domain adaptation in low-resource healthcare environments, offering promising directions for future rural medical conversational AI.
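As a rough illustration of what preparing doctor-patient turns for DialoGPT fine-tuning can look like, the sketch below joins the turns of a dialogue into one training string, ending each turn with the model's end-of-text token. The abstract does not describe the authors' preprocessing, so the function name `build_training_text` and the sample dialogue are hypothetical; only the use of `<|endoftext|>` as the turn separator follows the standard DialoGPT convention.

```python
# Minimal sketch, not the authors' pipeline: flatten a multi-turn dialogue
# into a single training string for a DialoGPT-style model.
# ASSUMPTION: turns alternate patient/doctor and each turn ends with the
# GPT-2 end-of-text token, as in the usual DialoGPT input format.

EOS_TOKEN = "<|endoftext|>"


def build_training_text(turns, eos=EOS_TOKEN):
    """Concatenate dialogue turns, terminating every turn with `eos`."""
    cleaned = [turn.strip() for turn in turns if turn.strip()]
    return "".join(turn + eos for turn in cleaned)


# Hypothetical example dialogue (not taken from the paper's dataset).
sample_turns = [
    "I have had a fever for three days.",
    "Do you also have chills or body aches?",
]
training_text = build_training_text(sample_turns)
```

The resulting string can then be tokenized and used as a causal language-modeling example, with the model learning to produce each doctor turn given the preceding context.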


Vision-Integrated LLMs for Autonomous Driving Assistance: Human Performance Comparison and Trust Evaluation

Kim, Namhee, Park, Woojin

arXiv.org Artificial Intelligence

Traditional autonomous driving systems often struggle with reasoning in complex, unexpected scenarios due to limited comprehension of spatial relationships. In response, this study introduces a Large Language Model (LLM)-based Autonomous Driving (AD) assistance system that integrates a vision adapter and an LLM reasoning module to enhance visual understanding and decision-making. The vision adapter, combining YOLOv4 and Vision Transformer (ViT), extracts comprehensive visual features, while GPT-4 enables human-like spatial reasoning and response generation. Experimental evaluations with 45 experienced drivers revealed that the system closely mirrors human performance in describing situations and moderately aligns with human decisions in generating appropriate responses.


TheraGen: Therapy for Every Generation

Doshi, Kartikey, Shah, Jimit, Shekokar, Narendra

arXiv.org Artificial Intelligence

We present TheraGen, an advanced AI-powered mental health chatbot utilizing the LLaMA 2 7B model. This approach builds upon recent advancements in language models and transformer architectures. TheraGen provides all-day personalized, compassionate mental health care by leveraging a large dataset of 1 million conversational entries, combining anonymized therapy transcripts, online mental health discussions, and psychological literature, including APA resources. Our implementation employs transfer learning, fine-tuning, and advanced training techniques to optimize performance. TheraGen offers a user-friendly interface for seamless interaction, providing empathetic responses and evidence-based coping strategies. Evaluation results demonstrate high user satisfaction rates, with 94% of users reporting improved mental well-being. The system achieved a BLEU score of 0.67 and a ROUGE score of 0.62, indicating strong response accuracy. With an average response time of 1395 milliseconds, TheraGen ensures real-time, efficient support. While not a replacement for professional therapy, TheraGen serves as a valuable complementary tool, significantly improving user well-being and addressing the accessibility gap in mental health treatments. This paper details TheraGen's architecture, training methodology, ethical considerations, and future directions, contributing to the growing field of AI-assisted mental healthcare and offering a scalable solution to the pressing need for mental health support.


Expanding the Set of Pragmatic Considerations in Conversational AI

Seals, S. M., Shalin, Valerie L.

arXiv.org Artificial Intelligence

Despite considerable performance improvements, current conversational AI systems often fail to meet user expectations. We discuss several pragmatic limitations of current conversational AI systems. We illustrate pragmatic limitations with examples that are syntactically appropriate, but have clear pragmatic deficiencies. We label our complaints as "Turing Test Triggers" (TTTs) as they indicate where current conversational AI systems fall short compared to human behavior. We develop a taxonomy of pragmatic considerations intended to identify what pragmatic competencies a conversational AI system requires and discuss implications for the design and evaluation of conversational AI systems.


Dialogue Systems Can Generate Appropriate Responses without the Use of Question Marks? -- Investigation of the Effects of Question Marks on Dialogue Systems

Mizumoto, Tomoya, Yamazaki, Takato, Yoshikawa, Katsumasa, Ohagi, Masaya, Kawamoto, Toshiki, Sato, Toshinori

arXiv.org Artificial Intelligence

When individuals engage in spoken discourse, various phenomena can be observed that differ from those that are apparent in text-based conversation. While written communication commonly uses a question mark to denote a query, in spoken discourse, queries are frequently indicated by a rising intonation at the end of a sentence. However, numerous speech recognition engines do not append a question mark to recognized queries, presenting a challenge when creating a spoken dialogue system. Specifically, the absence of a question mark at the end of a sentence can impede the generation of appropriate responses to queries in spoken dialogue systems. Hence, we investigate the impact of question marks on dialogue systems, with the results showing that they have a significant impact. Moreover, we analyze specific examples in an effort to determine which types of utterances have an impact on dialogue systems.


ZRIGF: An Innovative Multimodal Framework for Zero-Resource Image-Grounded Dialogue Generation

Zhang, Bo, Wang, Jian, Ma, Hui, Xu, Bo, Lin, Hongfei

arXiv.org Artificial Intelligence

Image-grounded dialogue systems benefit greatly from integrating visual information, resulting in high-quality response generation. However, current models struggle to effectively utilize such information in zero-resource scenarios, mainly due to the disparity between image and text modalities. To overcome this challenge, we propose an innovative multimodal framework, called ZRIGF, which assimilates image-grounded information for dialogue generation in zero-resource situations. ZRIGF implements a two-stage learning strategy, comprising contrastive pre-training and generative pre-training. Contrastive pre-training includes a text-image matching module that maps images and texts into a unified encoded vector space, along with a text-assisted masked image modeling module that preserves pre-training visual features and fosters further multimodal feature alignment. Generative pre-training employs a multimodal fusion module and an information transfer module to produce insightful responses based on harmonized multimodal representations. Comprehensive experiments conducted on both text-based and image-grounded dialogue datasets demonstrate ZRIGF's efficacy in generating contextually pertinent and informative responses. Furthermore, we adopt a fully zero-resource scenario in the image-grounded dialogue dataset to demonstrate our framework's robust generalization capabilities in novel domains. The code is available at https://github.com/zhangbo-nlp/ZRIGF.


Stay on topic with Classifier-Free Guidance

Sanchez, Guillaume, Fan, Honglu, Spangher, Alexander, Levi, Elad, Ammanamanchi, Pawan Sasanka, Biderman, Stella

arXiv.org Artificial Intelligence

Classifier-Free Guidance (CFG) [37] has recently emerged in text-to-image generation as a lightweight technique to encourage prompt-adherence in generations. In this work, we demonstrate that CFG can be used broadly as an inference-time technique in pure language modeling. We show that CFG (1) improves the performance of Pythia, GPT-2 and LLaMA-family models across an array of tasks: Q&A, reasoning, code generation, and machine translation, achieving SOTA on LAMBADA with LLaMA-7B over PaLM-540B; (2) brings improvements equivalent to a model with twice the parameter-count; (3) can stack alongside other inference-time methods like Chain-of-Thought and Self-Consistency, yielding further improvements in difficult tasks; (4) can be used to increase the faithfulness and coherence of assistants in challenging form-driven and content-driven prompts: in a human evaluation we show a 75% preference for GPT4All using CFG over baseline.
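The guidance step itself can be sketched in a few lines. The example below is a minimal illustration, assuming the commonly used formulation in which the next-token log-probabilities with and without the prompt are combined as uncond + gamma * (cond - uncond) and then renormalized. The function names and the toy logits are hypothetical, and a real setup would take both logit vectors from two forward passes of the same language model.

```python
import math

# Minimal sketch of classifier-free guidance over next-token logits.
# ASSUMPTION: guided = uncond + gamma * (cond - uncond) in log-probability
# space; gamma = 1 recovers ordinary conditional sampling.


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def cfg_log_probs(cond_logits, uncond_logits, gamma):
    """Mix conditional and unconditional log-probs with strength gamma."""
    cond = log_softmax(cond_logits)
    uncond = log_softmax(uncond_logits)
    mixed = [u + gamma * (c - u) for c, u in zip(cond, uncond)]
    return log_softmax(mixed)  # renormalize into a valid distribution


# Toy three-token vocabulary: the prompt favors token 0, while the
# unconditional model has no preference.
cond_logits = [2.0, 1.0, 0.0]
uncond_logits = [1.0, 1.0, 1.0]
plain = cfg_log_probs(cond_logits, uncond_logits, gamma=1.0)
guided = cfg_log_probs(cond_logits, uncond_logits, gamma=1.5)
```

With gamma above 1, probability mass shifts further toward the tokens that the prompt makes more likely, which is the prompt-adherence effect the abstract describes.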


Affiliate Network Programs: Is ChatGPT the Future of AI? A Comprehensive Review

#artificialintelligence

AI (Artificial Intelligence) refers to the creation of intelligent machines that can perform tasks that would normally require human intelligence, such as problem-solving, decision-making, natural language processing, and more. ChatGPT is a language model created by OpenAI that uses deep learning to generate human-like responses to text-based queries. It is trained on a massive dataset of text, enabling it to understand and generate human-like language. ChatGPT is designed to have natural conversations with users, providing information and assistance in a variety of domains. As an AI language model, ChatGPT is quite good at generating human-like text, but its performance can vary depending on the specific task and the context in which it is used.


Guide to ChatGPT - by Alex McFarland - AI Disruption

#artificialintelligence

So this week, I wanted to give you guys something a little different. The weekly AI Disruption will continue next week. I hope this guide proves useful! One of the hottest topics in the field of AI right now is ChatGPT, short for chat-based Generative Pre-trained Transformer. This powerful tool is being used across industries and for many use cases.


How AI is transforming chat channels?

#artificialintelligence

AI is used in chat channels to assist with tasks such as customer service, order fulfillment, and product research. For example, customer service can use AI to answer customer questions, identify customer needs, and make recommendations. AI can also be used to monitor chat channels for problem keywords and phrases and automatically respond with appropriate solutions. Conversational AI is the process of using machine learning and deep neural networks to enable users to communicate with computer systems in natural language. The system extracts user intent from text or voice input and transforms the text into structured data.
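To make the last step concrete, here is a toy sketch of turning a chat message into structured data. It uses plain keyword matching rather than the machine learning models the text refers to, and the intent names, keyword lists, and `#1234`-style order numbers are all hypothetical.

```python
import re

# Toy sketch: map a chat message to a structured intent record.
# ASSUMPTION: intents, keywords, and the "#<digits>" order-number format
# are invented for illustration; real systems typically use trained models.

INTENT_KEYWORDS = {
    "order_status": ("where is my order", "track", "delivery"),
    "refund": ("refund", "money back", "return"),
}


def extract_intent(message):
    """Return a dict with the detected intent and any order number found."""
    text = message.lower()
    order_match = re.search(r"#(\d+)", text)
    order_id = order_match.group(1) if order_match else None
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return {"intent": intent, "order_id": order_id}
    return {"intent": "unknown", "order_id": order_id}


record = extract_intent("Can I get a refund for order #1234?")
```

A downstream handler could then route on `record["intent"]` and look up the order by `record["order_id"]`.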